Abstract

A method is given that "inverts" a logic grammar and displays it from 
the point of view of the logical form, rather than from that of the 
word string. LR-compiling techniques are used to allow a recursive- 
descent generation algorithm to perform "functor merging" much in 
the same way as an LR parser performs prefix merging. This is an 
improvement on the semantic-head-driven generator that results in 
a much smaller search space. The amount of semantic lookahead can 
be varied, and appropriate tradeoff points between table size and 
resulting nondeterminism can be found automatically. This can be 
done by removing all spurious nondeterminism for input sufficiently 
close to the examples of a training corpus, and large portions of it 
for other input, while preserving completeness. 1 

1 Introduction 

With the emergence of fast algorithms and optimization techniques for syn- 
tactic analysis, such as the use of explanation-based learning in conjunction 
with LR parsing, see (Samuelsson & Rayner 1991) and subsequent work, 
surface generation has become a major bottleneck in NLP systems. Surface 
generation will here be viewed as the inverse problem of syntactic analysis 
and subsequent semantic interpretation. The latter consists in constructing 
some semantic representation of an input word-string based on the syntac- 
tic and semantic rules of a formal grammar. In this article, we will limit 
ourselves to logic grammars that attribute word strings with expressions in 
some logical formalism represented as terms with a functor-argument structure.
The surface generation problem then consists in assigning an output
word-string to such a term. This is a common scenario in conjunction
with, for example, transfer-based machine-translation systems employing
reversible grammars, and it differs from the scenario in which a deep generator or
a text planner is available to guide the surface generator. In general, both
these mappings are many-to-many: a word string that can be mapped to
several distinct logical forms is said to be ambiguous. A logical form that
can be assigned to several different word strings is said to have multiple
paraphrases.

1 I wish to thank Gregor Erbach, Jussi Karlgren, Manny Rayner, Hans Uszkoreit,
Mats Wiren and the anonymous reviewers of ACL, EACL, IJCAI and RANLP for
valuable feedback on previous versions of this article. Special credit is due to Kristina
Striegnitz, who assisted with the implementation.
This work was funded by the N3 "Bidirektionale Linguistische Deduktion (BiLD)" project in
the Sonderforschungsbereich 314 Künstliche Intelligenz - Wissensbasierte Systeme.

We want to create a generation algorithm that generates a word string 
by recursively descending through a logical form, while delaying the choice 
of grammar rules to apply as long as possible. This means that we want to 
process different rules or rule combinations that introduce the same piece of 
semantics in parallel until they branch apart. This will reduce the amount 
of spurious search, since we will gain more information about the rest of 
the logical form before having to commit to a particular grammar rule. 
In practice, this means that we want to perform "functor merging" much 
in the same way as an LR parser performs prefix merging by employing
parsing tables compiled from the grammar. One obvious way of doing this 
is to use LR-compilation techniques to compile generation tables. This will 
however require that we reformulate the grammar from the point of view of 
the logical form, rather than from that of the word string from which it is 
normally displayed. 

The rest of the paper is structured as follows: We will first review ba- 
sic LR compilation of parsing tables in Section 2. The grammar-inversion 
procedure turns out to be most easily explained in terms of the semantic- 
head-driven generation (SHDG) algorithm. We will therefore proceed to 
outline the SHDG algorithm in Section 3. The grammar inversion itself is 
described in Section 4, while LR compilation of generation tables is dis- 
cussed in Section 5. The generation algorithm is presented in Section 6. 
The example-based optimization technique turns out to be most easily explained
as a straightforward extension of a simpler optimization technique
that predates it, which is why this simpler technique is presented first, in
Section 7. The extension is described in Section 8, and the relation between
this example-based optimization technique and explanation-based learning is
discussed in Section 9.

2 LR Compilation for Parsing

LR compilation in general is well described in, for example, (Aho et al.
1986:215-247). Here we will only sketch the main ideas.

An LR parser is basically a pushdown automaton, i.e., it has a pushdown 
stack in addition to a finite set of internal states and a reader head for 
scanning the input string from left to right one symbol at a time. The 
stack is used in a characteristic way: The items on the stack consist of 
alternating grammar symbols and states. The current state is simply the 
state on top of the stack. The most distinguishing feature of an LR parser 
is however the form of the transition relation — the action and goto tables. 
A nondeterministic LR parser can in each step perform one of four basic 
actions. In state S with lookahead symbol 2 Sym it can: 

1. accept: Halt and signal success. 

2. error: Fail and backtrack. 

3. shift S2: Consume the input symbol Sym, push it onto the stack, and
transit to state S2 by pushing it onto the stack.

4. reduce R: Pop off two items from the stack for each grammar symbol
in the RHS of grammar rule R, inspect the stack for the old state S1
now on top of the stack, push the LHS of rule R onto the stack, and
transit to state S2 determined by goto(S1,LHS,S2) by pushing S2 onto
the stack.

Consider the small sample grammar given in Figure 1. To make this
simple grammar slightly more interesting, the recursive Rule 1, S -> S QM,
allows the addition of a question mark (QM) to the end of a sentence (S),
as in John sleeps?. The LHS S is then interpreted as a yes-no question
version of the RHS S.

Each internal state consists of a set of dotted items. Each item in turn 
corresponds to a grammar rule. The current string position is indicated by 
a dot. For example, Rule 2, S -> NP VP, yields the item S => NP . VP, 
which corresponds to just having found an NP and now searching for a VP. 

In the compilation phase, new states are induced from old ones: For the
indicated string position, a possible grammar symbol is selected and the
dot is advanced one step in all items where this particular grammar symbol
immediately follows the dot, and the resulting new items will constitute the
kernel of the new state. Non-kernel items are added to these by selecting
grammar rules whose LHS match grammar symbols at the new string position
in the new items. In each non-kernel item, the dot is at the beginning
of the rule. If a set of items is constructed that already exists, then this
search branch is abandoned and the recursion terminates.

2 The lookahead symbol is the next symbol in the input string, i.e., the symbol under
the reader head.

Fig. 1. Sample grammar



State 1
S' => . S
S => . S QM
S => . NP VP

State 2
S' => S .
S => S . QM

State 3
S => NP . VP
VP => . VP PP
VP => . VP AdvP
VP => . Vi
VP => . Vt NP

State 4
S => NP VP .
VP => VP . PP
VP => VP . AdvP
PP => . P NP

State 5
VP => VP PP .

State 6
VP => VP AdvP .

State 7
PP => P . NP

State 8
PP => P NP .

State 9
VP => Vi .

State 10
VP => Vt . NP

State 11
VP => Vt NP .

State 12
S => S QM .



Fig. 2. LR-parsing states for the sample grammar 



The state-construction phase starts off by creating an initial set consisting
of a single dummy kernel item and its non-kernel closure. This is State 1
in Figure 2. The dummy item introduces a dummy top grammar symbol as
its LHS, while the RHS consists of the old top symbol, and the dot is at the
beginning of the rule. In the example, this is the item S' => . S. The rest
of the states are induced from the initial state. The states resulting from 
the sample grammar of Figure 1 are shown in Figure 2, and these in turn 
will yield the parsing tables of Figure 3. The entry "s3" in the action ta- 
ble, for example, should be interpreted as "shift the lookahead symbol onto 
the stack and transit to State 3". The entry "r7" should be interpreted as 
"reduce by Rule 7". The accept action is denoted "acc" . The goto entries, 
like "g4", simply indicate what state to transit to once a nonterminal of 
that type has been constructed. 
Fig. 3. LR-parsing tables for the sample grammar
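As an illustration, the state-construction procedure described above can be sketched in a few lines of Python. This is a minimal sketch rather than the compiler used for the experiments: the rule order of the grammar is reconstructed from the items of Figure 2 and is an assumption, NP and the other lexical categories are treated as preterminals, and the names closure, advance and build_states are chosen here for illustration only.

```python
from collections import deque

# Sample grammar as (LHS, RHS) pairs. The rule order is an assumption.
GRAMMAR = [
    ("S", ("S", "QM")),
    ("S", ("NP", "VP")),
    ("VP", ("VP", "PP")),
    ("VP", ("VP", "AdvP")),
    ("VP", ("Vi",)),
    ("VP", ("Vt", "NP")),
    ("PP", ("P", "NP")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(kernel):
    """Add non-kernel items: rules whose LHS follows a dot, with the dot at position 0."""
    items = set(kernel)
    queue = deque(kernel)
    while queue:
        _, rhs, dot = queue.popleft()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in items:
                    items.add(item)
                    queue.append(item)
    return frozenset(items)


def advance(state, symbol):
    """Move the dot over `symbol` in every item where it follows the dot."""
    kernel = [(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol]
    return closure(kernel)


def build_states(top="S"):
    """Induce all states from the initial dummy item S' => . S."""
    initial = closure([("S'", (top,), 0)])
    states, index, transitions = [initial], {initial: 0}, {}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for symbol in sorted({r[d] for (_, r, d) in state if d < len(r)}):
            new = advance(state, symbol)
            if new not in index:  # an existing set of items terminates this branch
                index[new] = len(states)
                states.append(new)
                queue.append(new)
            transitions[(index[state], symbol)] = index[new]
    return states, transitions


states, transitions = build_states()
print(len(states))
```

Under these assumptions the sketch yields twelve item sets, corresponding to the twelve states of Figure 2, although the numbering of the states will in general differ from that of the figure.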



In conjunction with grammar formalisms employing complex feature 
structures, this procedure is associated with a number of interesting prob- 
lems, many of which are discussed in (Nakazawa 1991) and (Samuelsson 
1994c). For example, the termination criterion must be modified: If a new 
set of items is constructed that is more specific than an existing one, then 
this search branch is abandoned and the recursion terminates. If, on the 
other hand, it is more general, then it replaces the old one. 
3 The Semantic-Head-Driven Generation Algorithm

Generators found in large-scale systems such as the DFKI DISCO system
(Uszkoreit et al. 1994), or the SRI Core Language Engine (Alshawi
(ed.) 1992:268-275), typically tend to be based on the semantic-head-driven
generation (SHDG) algorithm. The SHDG algorithm is well described in
(Shieber et al. 1990); here we will only outline the main features.

The grammar rules of Figure 1 have been attributed with logical forms
as shown in Figure 4. The notation has been changed so that each constituent
consists of a quadruple ⟨Cat, Sem, W0, W⟩, where W0 and W form
a difference list representing the word string that Cat spans, and Sem is the
logical form. For example, the logical form corresponding to the LHS S of
the ⟨S, mod(X,Y), W0, W⟩ -> ⟨S, X, W0, W1⟩ ⟨QM, Y, W1, W⟩ rule consists of
a modifier Y added to the logical form X of the RHS S. As we can see from
the last grammar rule, this modifier is in turn realized as ynq.



⟨S, mod(X,Y), W0, W⟩ -> ⟨S, X, W0, W1⟩ ⟨QM, Y, W1, W⟩              1
⟨S, Y, W0, W⟩ -> ⟨NP, X, W0, W1⟩ ⟨VP, X^Y, W1, W⟩                  2
⟨VP, X^mod(Y,Z), W0, W⟩ -> ⟨VP, X^Y, W0, W1⟩ ⟨AdvP, Z, W1, W⟩      3
⟨VP, X^mod(Y,Z), W0, W⟩ -> ⟨VP, X^Y, W0, W1⟩ ⟨PP, Z, W1, W⟩        4
⟨VP, X, W0, W⟩ -> ⟨Vi, X, W0, W⟩                                   5
⟨VP, Y, W0, W⟩ -> ⟨Vt, X^Y, W0, W1⟩ ⟨NP, X, W1, W⟩                 6
⟨PP, Y, W0, W⟩ -> ⟨P, X^Y, W0, W1⟩ ⟨NP, X, W1, W⟩                  7

⟨NP, john, [John|W], W⟩ -> John
⟨NP, mary, [Mary|W], W⟩ -> Mary
⟨NP, paris, [Paris|W], W⟩ -> Paris
⟨Vi, X^sleep(X), [sleeps|W], W⟩ -> sleeps
⟨Vt, X^Y^see(X,Y), [sees|W], W⟩ -> sees
⟨P, X^in(X), [in|W], W⟩ -> in
⟨AdvP, today, [today|W], W⟩ -> today
⟨QM, ynq, [?|W], W⟩ -> ?

Fig. 4. Sample grammar with semantics 



For the SHDG algorithm, the grammar is divided into chain rules and
non-chain rules: Chain rules have a distinguished RHS constituent, the semantic
head, that has the same logical form as the LHS constituent, modulo
λ-abstractions; non-chain rules lack such a constituent. In particular, lexicon
entries are non-chain rules, since they do not have any RHS constituents
at all. This distinction is made since the generation algorithm treats the
two rule types quite differently. In the example grammar, rules 2 and 5
through 7 are chain rules, while the remaining ones are non-chain rules.
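The distinction between chain and non-chain rules can be checked mechanically. The following Python sketch is only an illustration under simplifying assumptions: logical forms are encoded as nested tuples, λ-abstraction X^Body is encoded as ("lam", X, Body), and syntactic identity of variable names stands in for unification. The encoding and the names lam, strip_lambdas and semantic_head are hypothetical.

```python
def lam(var, body):
    """Encode the abstraction var^body (an encoding assumed for this sketch)."""
    return ("lam", var, body)


def strip_lambdas(sem):
    """Remove leading abstractions, so that forms are compared modulo them."""
    while isinstance(sem, tuple) and sem[0] == "lam":
        sem = sem[2]
    return sem


# The syntactic rules of Figure 4 as ((LHS cat, LHS sem), [(RHS cat, RHS sem), ...]).
RULES = {
    1: (("S", ("mod", "X", "Y")), [("S", "X"), ("QM", "Y")]),
    2: (("S", "Y"), [("NP", "X"), ("VP", lam("X", "Y"))]),
    3: (("VP", lam("X", ("mod", "Y", "Z"))), [("VP", lam("X", "Y")), ("AdvP", "Z")]),
    4: (("VP", lam("X", ("mod", "Y", "Z"))), [("VP", lam("X", "Y")), ("PP", "Z")]),
    5: (("VP", "X"), [("Vi", "X")]),
    6: (("VP", "Y"), [("Vt", lam("X", "Y")), ("NP", "X")]),
    7: (("PP", "Y"), [("P", lam("X", "Y")), ("NP", "X")]),
}


def semantic_head(rule):
    """Return the position of the semantic head, or None for a non-chain rule."""
    (_, lhs_sem), rhs = rule
    target = strip_lambdas(lhs_sem)
    for position, (_, sem) in enumerate(rhs):
        if strip_lambdas(sem) == target:
            return position
    return None


chain_rules = sorted(n for n, rule in RULES.items() if semantic_head(rule) is not None)
print(chain_rules)
```

With this encoding the sketch classifies rules 2, 5, 6 and 7 as chain rules, in agreement with the division stated above.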

A simple semantic-head-driven generator might work as follows: Given 
a grammar symbol and a piece of logical form, the generator looks for a 
non-chain rule with the given semantics. The constituents of the RHS of 
that rule are then generated recursively, after which the LHS is connected 
to the given grammar symbol using chain rules. At each application of a 
chain rule, the rest of the RHS constituents, i.e., the non-head constituents, 
are generated recursively. The particular combination of connecting chain 
rules used is often referred to as a chain. The generator starts off with the 
top symbol of the grammar and the logical form corresponding to the string 
that is to be generated. 

The inherent problem with the SHDG algorithm is that each rule com- 
bination is tried in turn, while the possibilities of prefiltering are rather 
limited, leading to a large amount of spurious search. The generation al- 
gorithm presented in the current article does not suffer from this problem; 
what the new algorithm in effect does is to process all chains from a partic- 
ular set of grammar symbols down to some particular piece of logical form 
in parallel before any rule is applied, rather than to construct and try each 
one separately in turn. 

4 Grammar Inversion 

Before we can invert the grammar, we must put it in normal form. We will 
use a variant of chain and non-chain rules, namely functor-introducing rules 
corresponding to non-chain rules, and argument-filling rules corresponding 
to chain rules. The inversion step is based on the assumption that there are 
no other types of rules. 

Since the generator will work by recursive descent through the logical
form, we wish to rearrange the grammar so that arguments are generated
together with their functors. To this end we introduce another difference list,
A0 and A, to pass down the arguments introduced by argument-filling rules
to the corresponding functor-introducing rules. Here the latter rules are
assumed to be lexical, following the tradition in GPSG where the presence
of the SUBCAT feature implies a preterminal grammar symbol, see e.g.
(Gazdar et al. 1985:33), but this is really immaterial for the algorithm.

The grammar of Figure 4 is shown in normal form in Figure 5. The
grammar is compiled into this form by inspecting the flow of arguments
through the logical forms of the constituents of each rule. In the functor-introducing
rules, the RHS is rearranged to mirror the argument order of the
LHS logical form. The argument-filling rules have only one RHS constituent,
the semantic head, and the rest of the original RHS constituents are
added to the argument list of the head constituent. Note, for example, how
the NP is added to the argument list of the VP in Rule 2, or to the argument
list of the P in Rule 7. This is done automatically, although currently, the
exact flow of arguments is specified manually.

Functor-introducing rules

⟨S, mod(X,Y), W0, W, ε, ε⟩ -> ⟨S, X, W0, W1, ε, ε⟩ ⟨QM, Y, W1, W, ε, ε⟩      1
⟨VP, X^mod(Y,Z), W0, W, A0, A⟩ ->                                            3
    ⟨VP, X^Y, W0, W1, A0, A⟩ ⟨AdvP, Z, W1, W, ε, ε⟩
⟨VP, X^mod(Y,Z), W0, W, A0, A⟩ ->                                            4
    ⟨VP, X^Y, W0, W1, A0, A⟩ ⟨PP, Z, W1, W, ε, ε⟩

⟨NP, john, [John|W], W, A, ε⟩ -> A
⟨NP, mary, [Mary|W], W, A, ε⟩ -> A
⟨NP, paris, [Paris|W], W, A, ε⟩ -> A
⟨Vi, X^sleep(X), [sleeps|W], W, A, ε⟩ -> A
⟨Vt, X^Y^see(X,Y), [sees|W], W, A, ε⟩ -> A
⟨P, X^in(X), [in|W], W, A, ε⟩ -> A
⟨AdvP, today, [today|W], W, A, ε⟩ -> A
⟨QM, ynq, [?|W], W, A, ε⟩ -> A

Argument-filling rules

⟨S, Y, W0, W, ε, ε⟩ -> ⟨VP, X^Y, W1, W, [⟨NP, X, W0, W1⟩], ε⟩                2
⟨VP, X, W0, W, A0, A⟩ -> ⟨Vi, X, W0, W, A0, A⟩                               5
⟨VP, Y, W0, W, A0, A⟩ -> ⟨Vt, X^Y, W0, W1, [⟨NP, X, W1, W⟩|A0], A⟩           6
⟨PP, Y, W0, W, A0, A⟩ -> ⟨P, X^Y, W0, W1, [⟨NP, X, W1, W⟩|A0], A⟩            7

Fig. 5. Sample grammar in normal form

We assume that there are no purely argument-filling cycles. For rules
that actually fill in arguments, this is obviously impossible, since the number
of arguments decreases strictly. For the slightly degenerate case of
argument-filling rules which only pass along the logical form, such as the
⟨VP, X⟩ -> ⟨Vi, X⟩ rule, this is equivalent to the off-line parsability requirement
(Kaplan & Bresnan 1982:264-266). 3 We require this in order to avoid
an infinite number of chains, since each possible chain will be expanded out
in the inversion step. Since subcategorization lists of verbs are bounded in
length, PATR II style VP rules do not pose a serious problem, whereas the
"adjunct-as-argument" approach taken in (Bouma & van Noord 1994) may do so.
However, this problem is common to a number of
other generation algorithms, including the SHDG algorithm.

3 If the RHS Vi were a VP, we would have a purely argument-filling cycle of length 1.

Let us return to the scenario for the SHDG algorithm given at the end 
of Section 3: We have a piece of logical form and a grammar symbol, and 
we wish to connect a non-chain rule with this particular logical form to the 
given grammar symbol through a chain. We will generalize this scenario 
just slightly to the case where a set of grammar symbols is given, rather 
than a single one. 

Each inverted rule will correspond to a particular chain of argument- 
filling (chain) rules connecting a functor-introducing (non-chain) rule in- 
troducing this logical form to a grammar symbol in the given set. The 
arguments introduced by this chain will be collected and passed down to 
the functors that consume them in order to ensure that each of the inverted 
rules has a RHS matching the structure of the LHS logical form. The nor- 
malized sample grammar of Figure 5 will result in the inverted grammar 
of Figure 6. Note how the right-hand sides reflect the argument structure 
of the left-hand-side logical forms. As mentioned previously, the collected 
arguments are currently assumed to correspond to functors introduced by 
lexical entries, but the procedure can readily be modified to accommodate 
grammar rules with a non-empty RHS, where some of the arguments are 
consumed by the LHS logical form. 
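The way chains are expanded and their arguments collected can be sketched as follows. This Python fragment is a deliberately reduced illustration, not the inversion procedure itself: it ignores word strings and variable bindings, represents each argument-filling rule only by its head category and the categories it adds to the argument list, and represents each functor-introducing rule only by the functor it introduces. The table names and the functions chains and inverted_skeletons are hypothetical.

```python
# Argument-filling rules of Figure 5: LHS category -> (head category, added arguments).
ARGUMENT_FILLING = {
    "S": [("VP", ["NP"])],               # Rule 2
    "VP": [("Vi", []), ("Vt", ["NP"])],  # Rules 5 and 6
    "PP": [("P", ["NP"])],               # Rule 7
}

# Functors introduced at each category (Rules 1, 3 and 4, and the lexicon).
FUNCTOR_INTRODUCING = {
    "S": ["mod"],
    "VP": ["mod"],
    "Vi": ["sleep"],
    "Vt": ["see"],
    "P": ["in"],
}


def chains(category, args=()):
    """Enumerate every chain from `category`, with the arguments it collects.

    Termination relies on the absence of purely argument-filling cycles.
    """
    yield category, list(args)
    for head, added in ARGUMENT_FILLING.get(category, []):
        # New arguments are added at the front, as in [NP|A0] of Rules 6 and 7.
        yield from chains(head, tuple(added) + tuple(args))


def inverted_skeletons(top):
    """Pair each functor reachable from `top` with the arguments passed down to it."""
    return [
        (top, functor, args)
        for end, args in chains(top)
        for functor in FUNCTOR_INTRODUCING.get(end, [])
    ]


for skeleton in inverted_skeletons("S"):
    print(skeleton)
```

For the top symbol S, the sketch pairs mod with no collected arguments (Rule 1), mod with one NP (Rules 3 and 4 reached through Rule 2), sleep with one NP, and see with two NPs, which mirrors the S rules of the inverted grammar in Figure 6.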

The grammar inversion step is combined with the LR-compilation step. 
This is convenient for several reasons: Firstly, the termination criteria and 
the database maintenance issues are the same in both steps. Secondly, 
since the LR-compilation step employs a top-down rule-invocation scheme, 
this will ensure that the arguments are passed down to the corresponding 
functors. In fact, invoking inverted grammar rules merely requires first 
invoking a chain of argument-filling rules and then terminating it with a 
functor-introducing rule. 

5 LR Compilation for Generation 

Just as when compiling LR-parsing tables, the compiler operates on sets of
dotted items. Each item consists of a partially processed inverted grammar
rule, with a dot marking the current position. Here the current position is
an argument position of the LHS logical form, rather than some position in
the input string.

⟨S, mod(X,Y), W0, W, ε, ε⟩ ->
    ⟨S, X, W0, W1, ε, ε⟩ ⟨QM, Y, W1, W, ε, ε⟩
⟨S, mod(Y,Z), W0, W, ε, ε⟩ ->
    ⟨VP, X^Y, W1, W2, [⟨NP, X, W0, W1⟩], ε⟩ ⟨AdvP, Z, W2, W, ε, ε⟩
⟨S, mod(Y,Z), W0, W, ε, ε⟩ ->
    ⟨VP, X^Y, W1, W2, [⟨NP, X, W0, W1⟩], ε⟩ ⟨PP, Z, W2, W, ε, ε⟩
⟨VP, X^mod(Y,Z), W1, W, [⟨NP, X, W0, W1⟩], ε⟩ ->
    ⟨VP, X^Y, W1, W2, [⟨NP, X, W0, W1⟩], ε⟩ ⟨AdvP, Z, W2, W, ε, ε⟩
⟨VP, X^mod(Y,Z), W1, W, [⟨NP, X, W0, W1⟩], ε⟩ ->
    ⟨VP, X^Y, W1, W2, [⟨NP, X, W0, W1⟩], ε⟩ ⟨PP, Z, W2, W, ε, ε⟩
⟨S, sleep(X), W0, W, ε, ε⟩ -> ⟨NP, X, W0, [sleeps|W], ε, ε⟩
⟨VP, X^sleep(X), [sleeps|W], W, [⟨NP, X, W0, [sleeps|W]⟩], ε⟩ ->
    ⟨NP, X, W0, [sleeps|W], ε, ε⟩
⟨S, see(X,Y), W0, W, ε, ε⟩ -> ⟨NP, X, W1, W, ε, ε⟩ ⟨NP, Y, W0, [sees|W1], ε, ε⟩
⟨VP, Y^see(X,Y), [sees|W0], W, [⟨NP, Y, W1, [sees|W0]⟩], ε⟩ ->
    ⟨NP, X, W0, W, ε, ε⟩ ⟨NP, Y, W1, [sees|W0], ε, ε⟩
⟨PP, X^in(X), [in|W0], W, ε, ε⟩ -> ⟨NP, X, W0, W, ε, ε⟩

⟨NP, john, [John|W], W, ε, ε⟩ -> ε
⟨NP, mary, [Mary|W], W, ε, ε⟩ -> ε
⟨NP, paris, [Paris|W], W, ε, ε⟩ -> ε
⟨AdvP, today, [today|W], W, ε, ε⟩ -> ε
⟨QM, ynq, [?|W], W, ε, ε⟩ -> ε

Fig. 6. Inverted sample grammar

New states are induced from old ones: For the indicated argument po- 
sition, a possible logical form is selected and the dot is advanced one step 
in all items where this particular logical form can occur in the current ar- 
gument position, and the resulting new items constitute a new state. All 
possible grammar symbols that can occur in the old argument position and 
that can have this logical form are then collected. From these, all rules with 
a matching LHS are invoked from the inverted grammar. Each such rule 
will give rise to a new item where the dot marks the first argument position, 
and the set of these new items will constitute another new state. If a new
set of items is constructed that is more specific than an existing one, then
this search branch is abandoned and the recursion terminates. If, on the
other hand, it is more general, then it replaces the old one.

The state-construction phase starts off by creating an initial set consisting
of a single dummy item with a dummy top grammar symbol and a
dummy top logical form, corresponding to a dummy inverted grammar rule.
In the sample grammar, this would be the rule ⟨S', f(X), W0, W, ε, ε⟩ ->
⟨S, X, W0, W, ε, ε⟩. The dot is at the beginning of the rule, selecting the
first and only argument. The rest of the states are induced from this one.
The first three states resulting from the inverted grammar of Figure 6 are
shown in Figure 7, where the difference lists representing the word strings
are omitted.

State 1
⟨S', f(X), ε, ε⟩ => . ⟨S, X, ε, ε⟩

State 2
⟨S, mod(X,Y), ε, ε⟩ => . ⟨S, X, ε, ε⟩ ⟨QM, Y, ε, ε⟩
⟨S, mod(Y,Z), ε, ε⟩ => . ⟨VP, X^Y, [⟨NP, X⟩], ε⟩ ⟨AdvP, Z, ε, ε⟩
⟨S, mod(Y,Z), ε, ε⟩ => . ⟨VP, X^Y, [⟨NP, X⟩], ε⟩ ⟨PP, Z, ε, ε⟩

State 3
⟨S, mod(X,Y), ε, ε⟩ => . ⟨S, X, ε, ε⟩ ⟨QM, Y, ε, ε⟩
⟨S, mod(Y,Z), ε, ε⟩ => . ⟨VP, X^Y, [⟨NP, X⟩], ε⟩ ⟨AdvP, Z, ε, ε⟩
⟨S, mod(Y,Z), ε, ε⟩ => . ⟨VP, X^Y, [⟨NP, X⟩], ε⟩ ⟨PP, Z, ε, ε⟩
⟨VP, X^mod(Y,Z), [⟨NP, X⟩], ε⟩ => . ⟨VP, X^Y, [⟨NP, X⟩], ε⟩ ⟨AdvP, Z, ε, ε⟩
⟨VP, X^mod(Y,Z), [⟨NP, X⟩], ε⟩ => . ⟨VP, X^Y, [⟨NP, X⟩], ε⟩ ⟨PP, Z, ε, ε⟩

Fig. 7. The first three generation states 

The sets of items are used to compile the generation tables in the same 
way as is done for LR parsing. The goto entries correspond to transiting 
from one argument of a term to the next, and thus advancing the dot one 
step. The reductions correspond to applying the rules of items that have 
the dot at the end of the RHS, as is the case when LR parsing. There is 
no obvious analogy to the shift action — the closest thing would be the 
descend actions transiting from a functor to one of its arguments. 

Note that there is no need to include the logical form of each lexicon
entry in the generation tables. Instead, a typing of the logical forms can
be introduced, and a representative of each type used in the actual tables,
rather than the individual logical forms. This decreases the size of the
tables drastically. For example, there is no point in distinguishing the states
reached by traversing john, mary and paris, apart from ensuring that the
correct word is added to the output word-string. This is accomplished much
in the same way as preterminals, rather than individual words, figure in
LR-parsing tables.

6 The Generation Algorithm 

The generator works by recursive descent through the logical form while 
transiting between the internal states. It is driven by the descend, goto and 
reduce tables. A pushdown stack is used to store intermediate constituents. 
When generating a word string, the current state and logical form deter- 
mine a transition to a new state, corresponding to the first argument of the 
logical form, through the descend table. A substring is generated recur- 
sively from the argument logical form, and this constituent is pushed onto 
the stack. The argument logical form, together with the new current state, 
determine a transition to the next state through the goto table. The next 
state corresponds to the next argument of the original logical form, and 
another substring is generated from this argument logical form, etc. When 
no more arguments remain, an inverted grammar rule is selected nondeter- 
ministically by the reduce table and applied to the top portion of the stack, 
constructing a word string corresponding to the original logical form and 
completing this generation cycle. 4 

We now turn to optimizing the generation tables. 

7 Optimizing the Generation Tables 

The basic idea underlying the optimization technique presented in this ar- 
ticle is to remove as much nondeterminism from the generation tables as 
possible. One problem is that it may be impossible to remove all nondeter- 
minism for the simple reason that the current piece of logical form may in 
fact allow multiple paraphrases. In this case, we say that we have "real" 
nondeterminism. On the other hand, it may be the case that although lo- 
cally, several alternatives are possible, subsequent generation may rule out 
all but one of them. We will call this "spurious" nondeterminism. 

4 This is a bottom-up rule-invocation scheme. It could easily be modified so that a rule
is instead applied before constructing the substrings recursively, resulting in a top-down
rule-invocation scheme.

Due to the grammar inversion, and the way the sets of items are constructed,
all LHS logical forms of the items in some particular state will
be the same, and will thus have equal arity. Thus, there will be nothing 
analogous to shift-reduce conflicts in the resulting generation tables, only 
reduce-reduce conflicts. This means that the latter is the sole source of 
nondeterminism, and that this will arise only in states with more than one 
possible reduction. By inspecting the number of items left in each "reduc- 
tive state", i.e., each state where the dot is at the end of the rules, we can 
determine whether or not the generation tables will be deterministic. 

The logical form can be inspected down to an arbitrary depth of recur- 
sion when compiling the sets of items, and this parameter can be varied. 
This is closely related to the use of lookahead symbols in an LR parser; 
increasing the depth is analogous to increasing the number of lookahead 
symbols. The amount of semantic lookahead will be reflected in the goto 
and descend table entries. No semantic lookahead would mean only taking 
the functor of the logical form into consideration, and in the example above, 
a typical action table entry would be descend(1, mod(_,_), 2). 5 This would
mean that the generator would operate on State 2 of Figure 7 when gener- 
ating from the first argument of the mod/2 term, and both the S alternative 
and the (merged) VP alternative(s) would be attempted nondeterministi- 
cally. 

By taking the arguments of the logical form into account, the degree of 
nondeterminism can be reduced, and for the grammar given in Figure 1, it is 
eliminated completely. In the example, if the second argument of the mod/2 
term is ynq, then only the S alternative will be considered when generating 
from the first argument, since the relevant descend entries and states will 
be those of Figure 8. The optimal depth may vary for each individual table 
entry, and even within it, and a scheme has been devised to automatically 
find such an optimum. 
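The effect of varying the amount of semantic lookahead can be illustrated by truncating a logical form at a given depth and replacing everything below it with don't-care variables. The Python sketch below makes two simplifying assumptions that are not part of the scheme described here: terms are encoded as nested tuples, and the same depth is applied uniformly to all arguments, whereas the optimal depth may vary between and within table entries. The function names truncate and show are hypothetical.

```python
def truncate(term, depth):
    """Keep the functor of `term` plus `depth` further levels of its arguments."""
    if isinstance(term, str):  # an atom such as ynq
        return term
    functor, *args = term
    if depth == 0:
        # No lookahead below this functor: every argument becomes "_".
        return (functor,) + ("_",) * len(args)
    return (functor,) + tuple(truncate(arg, depth - 1) for arg in args)


def show(term):
    """Render a tuple-encoded term in the usual functor(arg,...) notation."""
    if isinstance(term, str):
        return term
    functor, *args = term
    return f"{functor}({','.join(show(arg) for arg in args)})"


logical_form = ("mod", ("sleep", "john"), "ynq")
print(show(truncate(logical_form, 0)))
print(show(truncate(logical_form, 1)))
```

With no lookahead the term is reduced to mod(_,_), the pattern of the descend entry above. With one level of lookahead it becomes mod(sleep(_),ynq), which has the form of the more specific descend entries shown in Figure 8.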

Assuming that it is actually possible to construct fully deterministic generation
tables by filtering on a large enough amount of semantic lookahead,
the problem reduces to finding, for each table entry, a lookahead depth that
will result in only one single remaining item in each reductive state. This
is in fact a stronger requirement than that all nondeterminism be spurious:
It may be the case that for each possible logical form, it is possible
to determine the appropriate reduction by a sufficient amount of semantic
lookahead, but due to potentially infinite recursion, no preassigned limit on
it will do. This is elaborated in the following section.

5 Here "_" denotes a don't-care variable.

descend(1, mod(mod(_,_), ynq), 2A).
descend(1, mod(see(_,_), ynq), 2B).
descend(1, mod(sleep(_), ynq), 2C).

State 2A
⟨S, mod(mod(X,Y), ynq), ε, ε⟩ => . ⟨S, mod(X,Y), ε, ε⟩ ⟨QM, ynq, ε, ε⟩

State 2B
⟨S, mod(see(X,Y), ynq), ε, ε⟩ => . ⟨S, see(X,Y), ε, ε⟩ ⟨QM, ynq, ε, ε⟩

State 2C
⟨S, mod(sleep(X), ynq), ε, ε⟩ => . ⟨S, sleep(X), ε, ε⟩ ⟨QM, ynq, ε, ε⟩

Fig. 8. Alternative generation states

The scheme employs iterative deepening. It tries to construct fully de- 
terministic tables by first allowing a total amount of semantic lookahead 
of one, then of two, etc., up to some maximum limit. This is however not 
done globally, but at each recursive call to the sets-of-items construction 
step, when a piece of logical form and a set of grammar symbols are used 
to invoke new inverted grammar rules to construct new sets of items. 

At this point, the total amount of available lookahead is distributed 
through the arguments of the functor of the current piece of logical form, 
and then further down to the arguments of the arguments, etc., until all has 
been used up. The current sets of items are then tentatively constructed. 
Increased semantic-lookahead depth will split potential nondeterminism in 
the resulting reductive states into distinct sets of items, and thus into dis- 
tinct reductive states with less nondeterminism, or preferably, with no non- 
determinism at all. If the resulting reductive states are all deterministic, 
then this particular semantic-lookahead setting is used to compile the actual 
generation tables, and the scheme recurses. In more detail, a set of terms 
mirroring the various ways of assigning semantic lookahead are generated 
and ordered according to how much lookahead they use up. The first one 
to yield fully deterministic reductive states is used when constructing the 
actual tables and is passed down in the recursion. 
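A much reduced version of this search can be sketched in Python. The sketch is only an illustration under stated assumptions: each candidate rule is represented by the argument patterns of its LHS logical form, lookahead per argument is either 0 (ignore) or 1 (inspect the functor), assignments are tried in order of total lookahead used, and capitalized strings are treated as variables that match anything. The rule names and the functions head, compatible and cheapest_deterministic_mask are hypothetical.

```python
from itertools import combinations, product


def head(term):
    """Functor of a term, or "_" for a variable (capitalized string)."""
    if isinstance(term, str):
        return "_" if term[:1].isupper() else term
    return term[0]


def compatible(args1, args2, mask):
    """True if some input could match both patterns at the inspected positions."""
    for a, b, inspect in zip(args1, args2, mask):
        if not inspect:
            continue
        ha, hb = head(a), head(b)
        if ha != "_" and hb != "_" and ha != hb:
            return False
    return True


def cheapest_deterministic_mask(rules):
    """Return the first lookahead assignment, by total cost, separating all rules."""
    arity = len(rules[0][1])
    for mask in sorted(product((0, 1), repeat=arity), key=sum):
        clashes = any(
            compatible(a, b, mask) for (_, a), (_, b) in combinations(rules, 2)
        )
        if not clashes:
            return mask
    return None  # no assignment within this limit is deterministic


# Argument patterns of the three mod/2 rules for S in the inverted sample grammar.
MOD_RULES = [
    ("S QM", ("X", "ynq")),
    ("VP AdvP", ("Y", "today")),
    ("VP PP", ("Y", ("in", "Z"))),
]
print(cheapest_deterministic_mask(MOD_RULES))
```

For these three rules the sketch selects the assignment that ignores the first argument of mod/2 and inspects only the functor of the second. When no assignment within the limit separates the rules, it returns None, which corresponds to the case where more lookahead, or some tolerated nondeterminism, is needed.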

In the running example, the first argument of mod/2 contributes no
important information when descending from State 1, while the second one
does. The scheme correctly finds the optimal depths when transiting from

descend(1, mod(_,ynq), 2).

State 2
⟨S, mod(X,ynq), ε, ε⟩ => . ⟨S, X, ε, ε⟩ ⟨QM, ynq, ε, ε⟩

Fig. 9. Alternative alternative generation states 



Since the scheme employs iterative deepening, this will guarantee that 
locally, no alternative table entries can inspect a smaller portion of the 
logical forms and still be deterministic, given the previous choices of seman- 
tic lookahead. This is a greedy algorithm, and it could potentially be the 
case that another choice of semantic lookahead would lead to less required 
lookahead in total by reducing that of the table entries generated in later 
recursion steps. 

8 An Example-Based Optimization Technique 

The optimization scheme as described so far is limited to grammars without 
real nondeterminism that only have removable spurious nondeterminism. A 
simple way of extending this to more general grammars is to introduce a 
second outer level of iterative deepening controlling the amount of nonde- 
terminism tolerated in each recursive call to the sets-of-items construction 
step. First, we try to construct generation tables with only one reduction in 
each reductive state. If this proves impossible within the maximum amount 
of total semantic lookahead allowed, we try to construct tables with at most 
two reductions in each resulting reductive state, etc. Since there is a finite 
number of inverted grammar rules, and thus a finite number of possible 
items, this process will terminate. Again, this optimization is done locally 
at each recursive call to the sets-of-items construction step. 
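
The two nested levels of iterative deepening can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the implementation described in this article: lookahead is modelled as a single uniform depth rather than being assigned per argument, logical forms are nested tuples, and the functors `in` and `sleep` as well as the names `truncate`, `max_reductions` and `choose_lookahead` are hypothetical.

```python
from collections import Counter

VAR = "_"  # an uninspected (variable) part of a logical form


def truncate(term, depth):
    """Keep only the top `depth` levels of a term; deeper parts become VAR."""
    if depth == 0:
        return VAR
    if isinstance(term, str):
        return term
    functor, *args = term
    return (functor, *(truncate(a, depth - 1) for a in args))


def max_reductions(patterns, depth):
    """Largest number of rule patterns that look identical at this depth."""
    return max(Counter(truncate(p, depth) for p in patterns).values())


def choose_lookahead(patterns, max_depth):
    """Outer loop: tolerated nondeterminism k. Inner loop: lookahead depth."""
    for k in range(1, len(patterns) + 1):
        for depth in range(max_depth + 1):
            if max_reductions(patterns, depth) <= k:
                return k, depth
    raise AssertionError("unreachable: k = len(patterns) succeeds at depth 0")


# Hypothetical logical-form patterns of three competing reductions.
spurious = [("mod", VAR, "ynq"), ("mod", VAR, ("in", VAR)), ("sleep", VAR)]
print(choose_lookahead(spurious, 3))                          # (1, 2)
# A second rule with the same pattern models real nondeterminism.
print(choose_lookahead(spurious + [("mod", VAR, "ynq")], 3))  # (2, 2)
```

In the first call the spurious nondeterminism disappears at depth 2, so one reduction per state suffices. In the second call no depth within the limit separates the two identical patterns, so the outer loop relaxes the target to two reductions.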

A problem with this approach is that the number of possible ways of 
assigning semantic lookahead increases drastically with the amount of 
lookahead allowed, and some heuristics are needed to direct the search. We 
will shortly describe a method that constructs more fine-tuned generation 
tables by using training examples to guide the search, to determine how 
much real nondeterminism there is at each point that cannot be removed, and 
to find appropriate lookahead depths that will remove all spurious 
nondeterminism on the training corpus. First, however, we will examine 
spurious nondeterminism a bit more closely. 

Assume that we add the following grammar rules for handling NPs with 
internal structure: 6 

<NP,q(X,Y)> -> <Det,X> <Nbar,Y> 
<Nbar,X> -> <N,X> 
<Nbar,mod(X,Y)> -> <Nbar,Y> <Nbar,X> 
<Nbar,mod(X,Y)> -> <AP,Y> <Nbar,X> 
<AP,mod(X,Y)> -> <Nbar,Y> <AP,X> 
<AP,X> -> <A,X> 

This will allow derivations like that of Figure 10. Here APoNB reads 
"Adjective phrase or N-bar". This in turn will allow constructing logical 
forms like 

mod(N0, mod(mod(...mod(AoN, Nn), ..., N2), N1)). 

To determine which of the rules <Nbar,mod(X,Y)> -> <Nbar,Y> <Nbar,X> and 
<Nbar,mod(X,Y)> -> <AP,Y> <Nbar,X> to apply, we must inspect the first 
argument AoN (adjective or noun) of the innermost mod/2 term, which may 
be arbitrarily deeply nested. Although this will never introduce multiple 
paraphrases, it does allow spurious nondeterminism that cannot be handled 
by a bounded amount of semantic lookahead. 

Fig. 10. A sample derivation (leaves, left to right: N1 N2 ... Nn AoN N0) 
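
A short Python sketch makes the unboundedness concrete. It assumes logical forms are encoded as nested tuples and uses the hypothetical helper names `compound_lf` and `depth_of_innermost_first_arg`. It only measures how many term levels must be inspected before the deciding argument AoN becomes visible.

```python
def compound_lf(n):
    """Build mod(n0, mod(mod(...mod(aon, n_n)..., n2), n1)) as nested tuples."""
    inner = "aon"
    for i in range(n, 0, -1):
        inner = ("mod", inner, f"n{i}")
    return ("mod", "n0", inner)


def depth_of_innermost_first_arg(lf):
    """Number of term levels that must be inspected to reach AoN."""
    term, depth = lf[2], 2  # second argument of the outermost mod/2
    while isinstance(term, tuple):
        term, depth = term[1], depth + 1  # follow the first arguments inwards
    return depth


for n in (1, 2, 5):
    print(n, depth_of_innermost_first_arg(compound_lf(n)))  # depth is n + 2
```

Since the required depth grows linearly with the number n of modifying nouns, any fixed lookahead bound is exceeded by a sufficiently long compound.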

A highly respectable objection to the presented example is that, apart 
from the proposed treatment of noun-noun and noun-adjective compounds 
being linguistically somewhat dubious, we will in practice never see cases 
where we need a very large amount of semantic lookahead. Precisely this 
is one of the two cornerstones on which the example-based optimization 
technique presented in this section rests. The other one is the observation 
that a lower bound on the amount of real nondeterminism can easily be 
established for each (portion of a) training example, while it is in the 
general case difficult to do this directly from the grammar. 

6 Again, the difference lists representing the word strings have been omitted. 

Thus, the training examples are used for three purposes: Firstly, to limit 
the search to branches that are relevant for input data that actually 
occur in real life. Secondly, to establish the minimum amount of 
nondeterminism at each point, i.e., the amount of real nondeterminism at 
this point that cannot be removed by greater lookahead depth. Thirdly, to 
find appropriate lookahead depths that will remove all spurious 
nondeterminism at each point in the training example. 

The generation tables are constructed much in the same way as in the 
previous section. The main difference is that instead of aiming at full de- 
terminism, the target nondeterminism is the real nondeterminism at each 
point of each training example. In more detail, a set of terms mirroring the 
various ways of assigning semantic lookahead are generated from the set 
of training examples, and they are ordered according to how much looka- 
head they employ. Intuitively, a (sub)term is constructed from each training 
example by replacing parts of it with free variables, thus removing the infor- 
mation contained in these parts of the training example, and the subterms 
are merged to form one term. Thus, terms employing more lookahead will 
contain more detailed information from the set of training examples. The 
first term to yield as deterministic reductive states as the one correspond- 
ing to the set of whole training examples, where no information has been 
blocked out by variables, is used for constructing the actual tables and is 
passed down in the recursion. 
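
The selection of the cheapest sufficiently informative term can be sketched in Python as follows. This is a minimal illustration under assumptions that are not taken from this article: it handles a single training example and does not merge the subterms of several examples, logical forms are nested tuples, and the rule labels, the functors `sleep` and `in`, the constant `john` and all function names are hypothetical.

```python
from itertools import product

VAR = "_"  # a free variable blocking out part of a training example


def generalizations(term):
    """All terms obtained by replacing subterms of `term` with VAR."""
    yield VAR
    if isinstance(term, str):
        if term != VAR:
            yield term
        return
    functor, *args = term
    for combo in product(*(list(generalizations(a)) for a in args)):
        yield (functor, *combo)


def size(term):
    """Lookahead used: the number of non-variable symbols in the term."""
    if term == VAR:
        return 0
    if isinstance(term, str):
        return 1
    return 1 + sum(size(a) for a in term[1:])


def compatible(a, b):
    """True if the two terms unify, treating every VAR as a wildcard."""
    if a == VAR or b == VAR:
        return True
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    return (len(a) == len(b) and a[0] == b[0]
            and all(compatible(x, y) for x, y in zip(a[1:], b[1:])))


def applicable(rules, term):
    """Labels of the rules whose logical-form pattern fits the term."""
    return [label for label, pattern in rules if compatible(pattern, term)]


def cheapest_lookahead(rules, example):
    """First generalization that is as deterministic as the whole example."""
    target = len(applicable(rules, example))
    for candidate in sorted(generalizations(example), key=size):
        if len(applicable(rules, candidate)) == target:
            return candidate


rules = [("S -> S QM", ("mod", VAR, "ynq")),
         ("S -> S PP", ("mod", VAR, ("in", VAR))),
         ("S -> NP VP", ("sleep", VAR))]
print(cheapest_lookahead(rules, ("mod", ("sleep", "john"), "ynq")))
# ('mod', '_', 'ynq')
```

The returned term inspects only the second argument of mod/2, in line with the observation made for the running example that the first argument contributes no important information.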

A technical complication is that the training examples interact with the 
termination criteria of the sets-of-items construction step: Although a new 
set of items may be more specific than an old one, it may stem from more de- 
manding training examples. In the current version of the scheme, this would 
result in recompiling the sets of items from the earliest point where too sim- 
ple examples were used, this time including the more demanding examples. 
To handle input outside the training corpus, a default lookahead depth is 
assigned to the possible continuations that are not encountered among the 
training examples. This means that the resulting generation tables preserve 
completeness and are guaranteed to be optimal, modulo the limitations of 
greedy algorithms, for input sufficiently similar to combinations of examples 
in the training corpus, but not necessarily for other input. 

The degree of generalization is considerable: To return to the running 
example of the nondeterminism in State 2 discussed above, a single training 
example like (a logical form corresponding to) John sleeps? or Mary sees 
a house in Paris will remove all nondeterminism in this state. In general, 
the table size seems to increase moderately with the number of training 
examples due to the good degree of generalization, although this needs to 
be more thoroughly investigated. 

The modified algorithm for including the training examples into the 
LR-compilation algorithm is guaranteed to terminate if the original 
LR-compilation algorithm terminates. The worst-case complexity is not very 
good, however. For the grammars and training sets tested thus far, 
processing efficiency is not a problem, though we can envision that for 
considerably larger grammars and training sets, there will be a need for 
optimizing the optimization procedure further. 

9 Discussion 

The new generation algorithm constitutes an improvement on the semantic- 
head-driven generation algorithm that allows "functor merging", i.e., en- 
ables processing various grammar rules, or rule combinations, that intro- 
duce the same semantic structure simultaneously, thereby greatly reducing 
the search space. The algorithm proceeds by recursive descent through the 
logical form, and using the terminology of the SHDG algorithm, what the 
new algorithm in effect does is to process all chains from a particular set 
of grammar symbols down to some particular piece of logical form in par- 
allel until a reduction is attempted, rather than to construct and try each 
one separately in turn. This requires a grammar-inversion technique that 
is fundamentally different from techniques such as the essential-argument 
algorithm, see the following, since it must display the grammar from the 
point of view of the logical form, rather than from that of the word string. 
LR-compilation techniques accomplish the functor merging by compiling 
the inverted grammar into a set of generation tables. 

The grammar inversion rearranges the grammar as a whole according 
to the functor-argument structure of the logical forms. Other inversion 
schemes, such as the essential-argument algorithm (Strzalkowski 1990) or 
the direct-inversion approach (Minnen et al. 1995), are mainly concerned 
with locally rearranging the order of the RHS constituents of individual 
grammar rules by examining the flow of information through these 
constituents, to ensure termination and increase efficiency. Although this 
can occasionally change the set of RHS symbols in a rule, it is done to 
these ends, rather than to reflect the functor-argument structure. 

Although the sample grammar used throughout the article is essentially 
context-free, there is nothing in principle that restricts the method to such 
grammars. In fact, the method could be extended to grammars employing 
complex feature structures as easily as the LR-parsing scheme itself, see for 
example (Nakazawa 1991), and this is currently being done. Some hand 
editing is necessary when preparing the grammar for the inversion step, 
but it is limited to specifying the flow of arguments in the grammar rules. 
Furthermore, this could potentially be fully automated. 

The set of applicable reductions can be diminished by resorting to deeper 
semantic lookahead, at the price of a larger number of internal states, and 
there is in general a tradeoff between the size of the resulting generation 
tables and the amount of nondeterminism when reducing. The employed 
amount of semantic lookahead can be varied, and a scheme has been de- 
vised and tested that automatically determines appropriate tradeoff points, 
optionally based on a collection of training examples. 

The latter version of the scheme turns out to be related to explanation- 
based learning (EBL) which has proved quite successful for optimizing LR- 
parsing tables for syntactic analysis. There, the basic idea is to learn spe- 
cial grammar rules from the original ones and a set of training examples by 
chunking together the former based on how they are used to parse the lat- 
ter. The relevant references are (Samuelsson & Rayner 1991), (Samuelsson 
1994a) and (Neumann 1994). 

Rayner and Samuelsson basically trade coverage for speed and accuracy 
by using the training examples to compile a new grammar that is used in- 
stead of the original one. Their problem is that the underlying NL systems 
that they work on employ find-all parsing strategies and subsequent selec- 
tion of the preferred analysis. This makes it very difficult to integrate the 
learned grammar with the original one without losing all processing speed 
gained. Neumann strives for a very close integration between the learned 
and original grammars by falling back to the original grammar when pro- 
cessing with the learned grammar alone has proved insufficient. He utilizes 
the fact that his original system employs a best-first parsing strategy, which 
allows intelligent reuse of partial results from the attempt to parse with the 
learned grammar. 

Another problem that has not previously been satisfactorily resolved is 
how to determine the degree of generalization of the examples, or, viewed 
from another angle, how to chunk together the original grammar rules. 
Rayner and Neumann hand-code special meta-rules, so-called operationality 
criteria, for this based on linguistic intuition. These criteria are then 
refined manually by experimentation. Samuelsson offers an automatic method 
for doing this that relates the desired coverage to the way the examples 
are generalized (Samuelsson 1994b). This quantity is however only 
indirectly related to the actual performance of the system using the 
resulting learned grammar. 

In contrast to this, the method described in the current article auto- 
matically preserves completeness; achieves fully seamless integration, since 
there is only one processing mode; and automatically determines the degree 
of generalization by minimizing a quantity that has a profound direct influ- 
ence on the resulting performance, namely the amount of nondeterminism in 
each reductive state. It would be very interesting to see if this idea could be 
carried over to syntactic parsing by manipulating the number of lookahead 
symbols to minimize the number of shift-reduce and reduce-reduce conflicts 
in the resulting LR parsing tables. 

The method has been implemented and applied to more complex gram- 
mars than the simple one used as an example in this article, and it works 
excellently. Although these grammars are still too naive to form the basis 
of a serious empirical evaluation lending substantial experimental support 
to the method as a whole, it should be obvious from the algorithm itself 
that the reduction in search space compared to the SHDG algorithm is most 
substantial. Nonetheless, such an evaluation is a top-priority item on the 
future-work agenda. 